CHAPTER 4
DEBUGGING STRATEGIES
There are many individual components which contribute to a completed
application. The logical flow of the program must be determined, the user
interface must be designed, and appropriate algorithms must be selected.
But no matter how much effort you devote to the design and implementation
of a program, the bottom line is it must also work correctly.
In an ideal scenario, you would begin writing a program by first
jotting down some notes that describe its operation. Next, you would
create an outline listing each of the program's major components. You
would then determine all of the subroutines and functions that are needed,
and perhaps even create a flow chart showing each of the paths that could
be taken. Properly prepared for any situation that might arise, you
finally write the actual code and find that it works perfectly. Now,
what's wrong with this picture? Few people actually program that way!
In practice, many programmers simply start coding with little
forethought and no detailed plan. They begin with the first statement and
continue to the last, occasionally reworking portions into subroutines as
necessary. After all, planning is not nearly as much fun as programming,
and everyone knows that fun is the most important part. Believe it or not,
I agree. There's nothing really wrong with plodding through a program,
stabbing here and there until it works. Indeed, some great algorithms
developed out of aimless doodling. I have personally never drawn a flow
chart, and I have no plans to start now.
What I will address here is how to find and correct problems when they
do occur. There are more things that can go wrong with a program than can
go right, and tracking down an elusive "Illegal function call" error that
appears only occasionally is definitely not much fun. How quickly you can
solve these problems is directly related to your understanding of
programming in general, and to your familiarity with the tools available.
In this chapter you will learn how to identify problems in your
programs, and also how to solve them. Programming errors, or bugs, can be
as simple as a misspelled variable name, and as complex and ornery as an
internal flaw in BASIC itself. The BASIC editing environment provides a
wealth of powerful debugging features, and understanding how to use them
will help you produce programs that are reliable and error free.
COMMON PROGRAMMING ERRORS
=========================
There are three distinct types of programming errors: simple misspellings
and other naming or syntax errors, incorrect logic such as misunderstanding
or incorrectly coding an algorithm, and failing to understand some of the
finer points of the BASIC language. No matter how carefully you type, no
matter how much forethought you apply to a particular problem, and no
matter how often you read the BASIC manuals, it is impossible to completely
avoid making mistakes.
The first category includes those errors caused by simple mistakes
such as misspelling a variable or procedure name. Trying to call a
subprogram that doesn't exist will be immediately obvious, because BASIC
gives you an error message before the program can be run. But an incorrect
variable name will return the wrong results with no warning.
Passing the wrong number of arguments to a procedure may or may not be
reported, depending on whether the routine has been declared. Assembly
language routines in a Quick Library can be particularly pesky in this
regard. Although BASIC automatically generates a DECLARE statement for
BASIC subprograms and functions you have loaded in source form, it does not
do this for routines in a Quick Library. If you call an assembly language
routine incorrectly, you will probably crash the PC. However, it is also
possible to corrupt string memory and not know it. Worse, a "String space
corrupt" error is often not reported until much later in the program. If
you run the short program below in the QuickBASIC 4.5 editor, it will
appear to operate correctly.
    X$ = SPACE$(1000)            'create a string
    POKE SADD(X$) - 2, 100       'corrupt string memory
    PRINT "Testing"
    X% = 1
    PRINT "More testing"
    X% = 2
    PRINT "Yet more testing"
    X% = 3
Here, the POKE statement is overwriting the back pointer that belongs to
X$, which is one type of string corruption that can occur. But QuickBASIC
doesn't know that this has happened, because it has no reason to check the
integrity of its string memory until another string assignment is made.
However, adding the statement PRINT FRE("") anywhere after the POKE command
causes BASIC to check string memory, and report the error. Even if your
program does not use POKE, calling a procedure incorrectly can cause it to
overwrite memory in this fashion.
Another simple error is inadvertently using the same variable name
twice, or omitting a type declaration character from a variable name. For
example, if you are using a variable named Bytes& to track how many bytes
of a file have been read, accidentally using Bytes later on will give the
wrong results. If a DEFINT statement is in effect, then Bytes will be an
integer variable. Otherwise, it will be single precision, which is also
incorrect. Unless you use the DIM...AS statement to declare a variable
explicitly, BASIC lets you have different variables with the same name.
That is, Var%, Var!, and Var# can all coexist in the same program, and each
is a unique variable.
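The short sketch below illustrates the point. The values are arbitrary and
chosen only for demonstration. Because Bytes& and Bytes are separate
variables, the second assignment never touches the long integer.

    Bytes& = 70000               'long integer byte counter
    Bytes = Bytes + 1            'typo: this is a different variable
    PRINT Bytes&; Bytes          'the two values are unrelated

After the second line Bytes& still holds 70000, while the newly created
variable Bytes holds 1.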
Similarly, using the wrong variable entirely will cause your program
to operate incorrectly, and again with no error message displayed. More
than once I have had a program with one FOR loop nested within another, and
used the outer loop counter variable when I meant to use the inner one.
Another common situation is caused by changing the name of a variable
during the course of writing a program. For example, you may have a
variable named BPtr that tracks where you are reading within a buffer. If
you later decide to change that name to BufPointer because it is more
meaningful, you must also remember to change all occurrences of the name.
Of course, BASIC's search and replace feature minimizes that problem. More
important, though, you must make a mental note to use the new name as you
continue to develop the program.
Forgetting to declare a function can also lead to incorrect results
that produce no warning. If an integer function is not declared, then
BASIC will dimension an array with that name if the function expects a
numeric argument. When BASIC encounters the statement X = FuncName%(Y%) it
assumes that FuncName% is an integer array, and creates an array containing
the default 11 elements. In this case X will be assigned a value of zero,
or you will receive a "Subscript out of range" error if Y% is not between 0
and 10. I once observed an unexplainable "Out of string space" error that
was caused by the statement Size = ScreenSize%(ULRow, ULCol, LRRow, LRCol).
ScreenSize% was a function present in a Quick Library, but without a
DECLARE statement BASIC created a 4-dimensional integer array.
LOGIC ERRORS
============
The second cause of bugs is logic errors, and these include adding when you
meant to subtract, or using the wrong variable altogether. Programs that
manipulate pointers (variables that hold the addresses of other variables)
are particularly prone to errors in logic. Another common logic error is
forgetting to trim the leading or trailing blanks from a file or directory
name before using it. If the operator enters " c:\thisfile.dat" and you
try to open that file, BASIC will report a "Bad file name" error.
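One way to guard against this, sketched here with an illustrative variable
named FileName$, is to trim the name before it is used. Note that LTRIM$
and RTRIM$ remove blank spaces only, not tab characters.

    LINE INPUT "File name: ", FileName$
    FileName$ = LTRIM$(RTRIM$(FileName$))    'strip leading/trailing blanks
    PRINT "{"; FileName$; "}"                'braces show what remains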
Another cause of logic errors is failing to consider all of the things
a user may enter. An inexperienced operator is likely to enter data that
you as the programmer would never consider, or select menu items in an
order that makes no sense. Indeed, never underestimate the value of beta
testers. After you have exhausted all of the possibilities you can think
of, give the program to a 4-year-old child, and ask him or her to try it
while you watch. Your uncle Ernie would be a good beta tester too, and the
less he knows about your program, the more valuable his contribution will
be. People who know absolutely nothing about computers have an uncanny
knack for creating "Illegal function call" errors in a program that you
just know is perfect.
Similarly, you must consider all of the possible error conditions that
could happen in a program. In an error handler that has a CASE statement
for each possibility you anticipate, also include a CASE ELSE clause for
those you haven't thought of. The short listing that follows shows a
typical error handler that incorporates this added safety measure.
    ON ERROR GOTO HandleErr
    ...
    ...
HandleErr:
    SELECT CASE ERR
        CASE 7, 14
            PRINT "Out of memory"
        CASE 24, 25, 27
            PRINT "Fix the printer"
        CASE 53
            PRINT "File not found"
        CASE ELSE
            PRINT "Error number"; ERR
    END SELECT
    ...
    ...
The CASE ELSE clause lets you accommodate any possibility, and your user
can then at least report to you what the error number was. This simple
example doesn't include all of the possibilities, but you can certainly see
the general concept.
Another common logic error is using the same file number twice. When
a file has been opened as #1, that number remains in use until the file is
closed. This can be problematic when writing reusable modules, since
there is no way to know which files may be in use by the main program.
Some programmers use #99 or another unlikely number in a routine that will
be reused in many programs. But even that approach is flawed, because you
have to remember which numbers are used by which routines.
BASIC's FREEFILE function is intended to solve this problem, and it
returns the next available file number. Be sure to save the result
FREEFILE returns, however, since the value will change as soon as the next
file is opened. The code below shows both the wrong and right ways to use
FREEFILE.
Wrong:
    OPEN "accounts.dat" FOR INPUT AS #FREEFILE
    INPUT #FREEFILE, X$          'FREEFILE has changed!
    CLOSE #FREEFILE
Right:
    FileNum = FREEFILE           'get and save the number
    OPEN "accounts.dat" FOR INPUT AS #FileNum
    INPUT #FileNum, X$
    CLOSE #FileNum
In the first example, if FREEFILE returns, say, a value of 2, then it will
return 3 at the INPUT statement, which is of course incorrect. Therefore,
you must save the value FREEFILE returns, and use that for all subsequent
file accesses. This situation also occurs with INKEY$, because once a
character has been returned it is no longer available unless you saved it.
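The same idea applied to INKEY$ might look like the sketch below, where K$
is simply a name chosen for illustration:

    DO
        K$ = INKEY$              'read the keyboard once and save the result
    LOOP UNTIL LEN(K$)           'wait until a key is pressed
    IF K$ = CHR$(27) THEN
        PRINT "Escape was pressed"
    ELSE
        PRINT "You pressed "; K$
    END IF

Testing INKEY$ directly in each IF would read the keyboard again each
time, and the key that ended the loop would already be gone.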
Two other frequent problems are attempting to use LSET to assign
characters into a string that does not exist, and failing to clear a
counter variable within a static subprogram or function. The second
problem can be especially frustrating, because the routine will work
correctly the first time it is invoked. In the function below, a counter
returns the number of embedded control characters it finds in a string.
    FUNCTION CtrlCount%(Work$) STATIC
        FOR X% = 1 TO LEN(Work$)
            IF ASC(MID$(Work$, X%, 1)) < 32 THEN
                Count% = Count% + 1
            END IF
        NEXT
        CtrlCount% = Count%      'return the count
    END FUNCTION
The problem here is that Count% retains its value between function
invocations. Therefore, each time CtrlCount% is used it will return ever
higher values. One solution is to add the statement Count% = 0 at the
beginning of the function. Another is to omit the STATIC option from the
function definition.
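With the first fix applied, the function might read as follows. This is
only a sketch of that one approach; omitting STATIC works equally well.

    FUNCTION CtrlCount% (Work$) STATIC
        Count% = 0               'clear the total left from the last call
        FOR X% = 1 TO LEN(Work$)
            IF ASC(MID$(Work$, X%, 1)) < 32 THEN
                Count% = Count% + 1
            END IF
        NEXT
        CtrlCount% = Count%      'return the count
    END FUNCTION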
UNDERSTANDING BASIC'S QUIRKS
The third type of error is caused by not understanding some of BASIC's
finer points and quirks. For example, some people do not realize that
omitting the third argument from MID$ causes it to return all of the
remaining characters in a string. To see if a drive letter was given as
part of a file name and if so extract it, you might use a statement such as
IF MID$(FileName$, 2) = ":" THEN Drive$ = LEFT$(FileName$, 1). But since
the number of characters is not specified to MID$, it returns all but the
first character in the string. Unless the string is a drive letter and
colon only ("C:"), the test for a colon can never succeed. The solution, of
course, is to use MID$(FileName$, 2, 1).
An intimate knowledge of BASIC's idiosyncrasies also bears on the
earlier example of a file name that contains leading blanks. Most
programmers do not use INPUT to
accept information, unless the program is very simple and it will be used
only occasionally. However, asking for a file name with INPUT is one way
to avoid that problem, because INPUT strips all leading and trailing blank
spaces, as well as CHR$(9) tab characters. The more useful LINE INPUT, on
the other hand, does not strip leading blanks and tabs. Most programmers
would never be so foolish as to enter a file name with leading blanks. So
this is yet another situation where it is important to consider all of the
possibilities.
It is also possible to crash a program by using the ASC function when
the string might be null. Again, *you* would never press Enter alone in
response to a prompt for a file name or other mandatory information, but
someone else might.
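A simple guard is to test the length first. In this sketch Reply$ and
Code% are illustrative names only:

    LINE INPUT "Enter a file name: ", Reply$
    IF LEN(Reply$) THEN
        Code% = ASC(Reply$)      'safe: at least one character exists
        PRINT "First character code:"; Code%
    ELSE
        PRINT "Nothing was entered"    'ASC("") is an Illegal function call
    END IF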
Another BASIC quirk is caused by rounding errors. As you saw in
Chapter 2, adding or multiplying many numbers in succession can produce
results that are not precisely correct. Instead of checking to see if a
value is zero, it is often better to compare it to a very small number.
That is, instead of IF Value# = 0 you would use IF Value# < .000001 (when
Value# cannot be negative), or IF Value# < .000001 AND Value# > -.000001,
or something similar. Also, some numbers simply cannot be represented
exactly. If you try to enter the
statement X# = .00000000001 in the QuickBASIC 4.5 editor, the value will be
converted to 9.999999999999999D-12 as soon as you press Enter.
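As a rough illustration, the sketch below adds .1 ten times and then tests
the difference from 1 both ways. The tolerance of .000001 is an arbitrary
value; choose one that suits the magnitude of your data.

    Total# = 0
    FOR X% = 1 TO 10
        Total# = Total# + .1#    '.1 has no exact binary representation
    NEXT
    Diff# = Total# - 1#
    IF Diff# = 0 THEN PRINT "Exactly zero" ELSE PRINT "Not exactly zero"
    IF ABS(Diff#) < .000001# THEN PRINT "Close enough to zero"

The second test succeeds whether or not a small rounding error has crept
into Total#.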
Although not technically a BASIC quirk, many programmers forget that
variables within a DEF FN function are by default global. Unless you
include an explicit STATIC statement listing each variable that is to be
local to the function, it is likely that an unexpected change will be made
to a variable in the main program.
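The sketch below shows the idea using a hypothetical function named
FNDouble%. Without the STATIC statement, the assignment to Temp% inside
the function would overwrite the main program's Temp%.

    DEF FNDouble% (N%)
        STATIC Temp%             'make Temp% local to this function
        Temp% = N% * 2
        FNDouble% = Temp%
    END DEF

    Temp% = 5
    PRINT FNDouble%(10)          'the function uses its own Temp%
    PRINT Temp%                  'the main program's Temp% is unchanged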
Some programming situations require that you obtain the address of a
string variable using SADD. However, SADD is not legal for use with a
fixed-length string or the string portion of a TYPE variable. More
important, when using BASIC PDS far strings you must also remember to use
SSEG to get the string's data segment. Using VARSEG will not create an
error; however, the program will not work correctly.
Related to that, it is important to remember that strings and dynamic
arrays move around in memory--often at unexpected times. The program below
appends a zero character to one string for each zero that is found in
another string. Since BASIC may move Work$ during the course of assigning
Zero$, this code will fail eventually:
    Address = SADD(Work$)
    FOR Y = Address TO Address + LEN(Work$) - 1
        IF PEEK(Y) = 48 THEN Zero$ = Zero$ + "0"
    NEXT
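One way to avoid the problem, sketched here under the same assumptions as
the listing above, is to ask for the address again on every pass instead
of saving it once. The sample contents of Work$ are arbitrary.

    Work$ = "1020"               'sample data containing two zeros
    Zero$ = ""
    FOR X% = 0 TO LEN(Work$) - 1
        Address = SADD(Work$)    'Work$ may have moved, so ask again
        IF PEEK(Address + X%) = 48 THEN Zero$ = Zero$ + "0"
    NEXT
    PRINT Zero$

Because the address is refreshed immediately before each PEEK, it no
longer matters whether assigning Zero$ caused Work$ to be relocated.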
Another particularly insidious bug can result if you inadvertently add
parentheses around a variable that is passed to a subprogram or function.
In the example below, a subprogram that intentionally modifies a parameter
has been declared and is then called without the CALL keyword.
    DECLARE SUB Square(Param%)
    Square (Value%)              'the parentheses pass a copy of Value%
    SUB Square(Value%) STATIC
        Value% = Value% * Value%
    END SUB
Because of the unnecessary and incorrect use of parentheses, a copy of the
argument is sent to Square instead of the argument itself, with the result
that Value% is never actually changed. The fix is to either remove the
parentheses, or add the word CALL. Another, related issue is placing a
DEFINT after DECLARE statements. In the example below, the parameters X,
Y, and Z are assumed by BASIC to be single precision, even though this is
clearly not what was intended.
    'MySub is a placeholder name; DECLARE SUB requires one
    DECLARE SUB MySub (X, Y, Z)  'X, Y, and Z are singles!
    DEFINT A-Z
    .
    .
The final issue I want to address here is potential overflow errors. The
statement IF IntVar% * 14 > 1000000 can never be true, because BASIC
performs integer math assuming an integer range only. Unless you compile
your program using the /d debug option, the error will be unreported in a
compiled program. If this statement is executed within the QB environment,
BASIC will report an overflow error, even though the instruction certainly
appears to be legal. But since integer math assumes an integer result, the
product of IntVar% times 14 will overflow the range of integer values if
IntVar% is greater than 2,340.
One solution is to use a long integer for IntVar, and BASIC will then
use the range of long integers for the comparison. Using a long integer
wastes memory, however, and calculations on long integers are slower and
require more code to implement. A much better solution is to use CLNG
(Convert to Long), which tells BASIC to assume a long integer result.
The statement IF CLNG(IntVar%) * 14 > 1000000 will create a long
integer version of IntVar%, then multiply that value by 14 and use
that for the subsequent comparison. Unlike the copies that BASIC makes
which steal DGROUP memory, the long integer conversion in this instance is
handled within the CPU's registers. CLNG when used this way is really just
a compiler directive, as opposed to a called library routine. Another
solution is to add an ampersand after the constant 14, thus: IF IntVar% *
14& > 1000000. Again, no additional DGROUP memory is used to handle 14 as
a long integer value.
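Both forms can be tried side by side in a short test. The value 3000 and
the limit of 40000 are arbitrary choices for illustration:

    IntVar% = 3000
    IF CLNG(IntVar%) * 14 > 40000 THEN PRINT "CLNG: over the limit"
    IF IntVar% * 14& > 40000 THEN PRINT "14&: over the limit"

Here 3000 times 14 is 42000, which exceeds the integer range but fits
easily in a long integer, so both messages are printed.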
Another interesting use of CLNG and CINT--unrelated to debugging but
worth mentioning nonetheless--is to reduce the size of comparison code.
When you use a statement such as IF X% > VAL(Some$), a floating point
comparison is performed even if Some$ holds an integer value. By replacing
that example with IF X% > CINT(VAL(Some$)) 6 bytes of code can be saved.
The CINT tells BASIC that it will not have to perform any floating point
rounding when it compares the two values.
DEBUGGING AND TESTING TECHNIQUES
================================
When you are developing a large application that is composed of many
individual modules, there are several useful debugging techniques you can
employ. One is to create short test-bed programs that exercise each
subprogram and function. Finding an error in a complex program with many
interdependencies between subroutines can be a tedious prospect at best.
If you instead create a small program whose sole purpose is to test a
particular subprogram, you will be better able to focus on just that
routine.
Another useful technique for detecting and preventing sporadic errors
is to test your code on "boundary conditions". If you have a routine that
reads and processes a file in 4K (4096 byte) increments, test it with a file
that is exactly 4096 bytes long, as well as with other test files that are
4095 and 4097 bytes long.
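Test files like these are easy to generate. The sketch below is one way
to do it. The file names and the fill character are arbitrary, and it
assumes no files with those names already exist, because opening an
existing file FOR BINARY does not shorten it.

    FOR Size% = 4095 TO 4097
        Buffer$ = STRING$(Size%, "A")            'Size% bytes of filler
        FileName$ = "TEST" + LTRIM$(STR$(Size%)) + ".DAT"
        FileNum = FREEFILE                       'get and save the number
        OPEN FileName$ FOR BINARY AS #FileNum
        PUT #FileNum, , Buffer$                  'write the filler bytes
        CLOSE #FileNum
    NEXT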
Perhaps nothing is more frustrating than having a program fail with
the message "xxx at line No line number". This message is a throw-back to
the days when all BASIC programs had to use line numbers. Now that line
numbers are not required in modern compiled BASIC, most programmers do not
use them, opting instead for more descriptive line labels when labels are
needed at all. When an error does occur and the program has been compiled
with /d, BASIC reports the number of the nearest numbered line preceding
the line in which the error occurred.
A good solution to track down the cause of such errors is to use a
variant on a hardware debugging technique known as the "cut in half"
method. In a complex electronic circuit that does not work, using this
technique means that the circuit is first checked at its mid-point for the
correct signal. If the circuit tests correctly at that point, then the
error is in the second half. Therefore, the test engineer would "cut in
half" again, and test at a point halfway between the middle and the end.
If the test fails there, then the problem must lie between the middle of
the circuit and that point.
In a purely software situation, you would add a line number to a line
that falls approximately half-way through the program. If that number is
reported, then the problem is occurring in the second half of the program.
An enhancement to this technique that I recommend is to add, say, ten line
numbers in evenly spaced increments throughout the program. This will let
you quickly isolate the problem to a much smaller portion of the program.
Besides the line number (or lack of line number) that BASIC reports,
the segment and address at which the error occurred are also reported. This
information is frankly useless in a purely BASIC environment. You must
either use CodeView to identify the line that is associated with the error,
or view the assembly language output that BC can optionally generate.
These will be described in the section on advanced debugging later in this
chapter.
Finally, it is important to point out that you should never use ON
ERROR while a program is being developed. ON ERROR can hide programming
errors that you need to know about. As an example, a LOCATE statement with
incorrect values will generate an "Illegal function call" error. But if ON
ERROR is in effect and your program uses RESUME NEXT for errors it is not
expecting, you may never even know that an error occurred. If you run the
complete program below you can see that there is no indication that an
error occurred at the obviously illegal LOCATE statement.
    CLS
    ON ERROR GOTO HandleErr
    LOCATE 100, -90
    PRINT "My program seems to work fine."
    END
HandleErr:
    RESUME NEXT
USING THE QB AND QBX EDITING ENVIRONMENTS
The single most powerful debugging feature that is available to you is the
BASIC editing environment. More than just an editor that you can use to
enter program statements, the QB environment is exactly that: a complete
editing environment for developing and testing BASIC programs. The BASIC
editor lets you enter program statements, single-step through a program,
examine variable values, and much more. Besides being able to execute
commands singly and in sequence, you can also trace into subroutines and
functions, and even run your program in reverse.
The primary advantage of using the QB environment instead of a
separate editor is the enhanced debugging capabilities. In most high-level
languages, you first write a program using an editor, and then compile and
run it to see if it works correctly. If an error occurs, you must start
the editor again, load your program, and study the code to see what went
wrong. In contrast, QB lets you run your program at the same time it is
being edited. You can even modify the program while it is running and then
resume execution, view and change variable values, and change the order in
which statements are executed.
Further, BASIC can be instructed to stop and return to the edit mode
when the program reaches a certain statement, or when a particular logical
condition becomes true. For example, you can tell BASIC to halt the
program when a variable takes on a specified value. These are extremely
powerful debugging tools which have no equal in any other language. In the
sections that follow, I will describe each of these capabilities in detail.
STEP AND TRACE DEBUGGING
Early versions of Microsoft BASIC offered a very primitive trace capability
that displayed the line numbers of the currently executing statements.
Although this was better than nothing, interpreting a blur of line numbers
flashing by on the screen required a lot of mental effort. When Microsoft
introduced QuickBASIC version 3.0 they added greatly improved debugging in
the form of a step and trace feature. To activate step and trace you would
enter a STOP statement at a selected point in the source code. When the
program reached that point you could then execute each statement in
sequence by pressing a function key. QuickBASIC 3 also provided the
ability to display continuously the value of a single variable in a window
at the top of the screen.
QuickBASIC 4.0 offered an improved version of this feature, using
additional function keys to control how a program proceeds. This method
has been continued with little change through current versions of
QuickBASIC and BASIC PDS. Of course, the primary reason you would want to
step through a program one statement at a time is to determine why it is
not working. For example, if you have code that opens a file for output
but the file is never created, you would step through that portion of the
code to see which statements are being executed and which are not. In
particular, stepping through a program lets you see which path an IF or
CASE test is taking.
Two function keys are used to single-step through a program, and four
additional options are available to assist program debugging. Each time
the F10 key is pressed, the current statement is executed and the program
advances to the next statement. If you have just loaded the program being
tested, you will press F10 once to get to the first instruction. Pressing
F10 again executes that statement, and continues to the next one. If the
current statement is related to screen activity, the screen is switched
momentarily to display the program's output rather than the source code.
The screen is also switched during a CALL statement or function invocation,
in case that routine performs screen output. You can optionally toggle
between viewing the output and edit screens manually by pressing F4.
In some cases you may want to treat a subroutine as a single
statement, which is what F10 does. That is, CALL MySub is handled as a
single statement, and all of the statements within the routine are executed
as one operation. In other cases, however, you may need to trace into a
subprogram, GOSUB routine, DEF FN, or function, to step through its
statements as well. This is what F8 is for. When F8 is pressed at a CALL
or GOSUB statement or function invocation, BASIC traces into the procedure
and lets you watch as it executes each statement individually.
Two additional capabilities let you navigate a program more quickly.
Pressing F7 tells BASIC to execute all of the statements up to the current
cursor location. This way, you are spared from having to watch a long
sequence of commands that you know are working correctly. For example,
stepping through a FOR/NEXT loop that initializes 1000 elements in an array
is usually pointless. Therefore, when you reach that spot in the program
you would manually move the cursor to the statement following the NEXT, and
press F7.
It is also possible to force execution to a particular point in the
program using the "Set next statement" option of the Debug menu. Unlike
F7, though, the statements that precede the selected line will not be
executed. Therefore, this option is equivalent to adding a temporary GOTO
to the program, causing it to jump to the specified line.
One of the most powerful features of the BASIC editor is that you can
actually modify your program, then resume execution. In earlier versions
of QuickBASIC, making even the slightest change to a program--even if only
to a single comment--meant the entire program had to be recompiled. BASIC
can now preserve variable values and indeed the entire program state during
most types of editing operations.
The last important step operation I want to mention now is the History
feature. This too must be selected from a menu, and using it will slow
your program's operation considerably. When the History option is selected
from the Debug menu, BASIC remembers the last 25 program statements, and
lets you step through your program in reverse. For example, if a variable
has taken on an incorrect value, you can walk backwards through the program
to see what statements caused that to happen. Where F8 steps forward
through your program, Shift-F8 instead steps backward.
WATCH VARIABLES AND BREAK POINTS
As powerful as BASIC's single-step feature is, it is only half of the
story. Equally important is the Watch capability that lets you view a
program's variables in real time. One or more variables may be placed into
a special Watch window at the top of the editing screen, and their values
will be displayed and updated after each statement is executed. Between
the Step and Watch features, you can observe all aspects of your program's
operation as it is executing.
Besides watching variable values, you can also monitor complex
expressions and function results. For example, you could watch the value
of X% * Y% + Z%, ASC(Work$), or the result of a function such as
StrFunction$(Array$(), Count%). Because each variable or expression is
updated after every program statement, your program will run more slowly
when many items are displayed in the watch window. However, this is seldom
a problem in a debugging situation, and the ability to see precisely what
is happening far outweighs the minor speed penalty.
Being able to watch the results of expressions as well as simple
variables offers some useful and interesting techniques. As an example,
suppose you are watching a string variable named Buffer$. If Buffer$ is
very long, you can use LEFT$ or MID$ to watch just a portion of the string:
MID$(Buffer$, CurPointer%, 70). This expression displays the 70-character
portion of Buffer$ that is currently pointed to by CurPointer% (assuming,
of course, you are using variables with those names).
Likewise, if you are observing a string but nothing is showing in the
watch window, you could watch "{" + Work$ + "}". This displays "{}" if the
string is null, and shows if there are leading or trailing blanks or
CHR$(0) bytes. Adding braces also lets you see if the string contains
characters that begin past the edge of the visible window.
One particularly powerful use of BASIC's Watch capability is related
to the fact that all of the expressions are evaluated anew at each
statement. Earlier I mentioned how insidious "String space corrupt" errors
can be, because BASIC checks the integrity of its string memory only when a
string is being assigned. Therefore, watching the expression FRE(Any$)
tells BASIC to evaluate string memory after every source line. Thus, as
soon as string memory is corrupted it will be immediately reported. This
technique can be extended to identify a "Far heap corrupt" error as well,
by watching the expression FRE(-1).
Besides the Step and Watch capabilities, there are two additional
features you should understand: Break Points and Watch Points. When a
program is very large and complex, it becomes impractical to step and trace
through every statement. Also, in some cases you may not know at which
statement an error is occurring.
Pressing F9 sets up a Break Point which tells BASIC to halt when it
reaches that point in the program, regardless of how it arrived there. You
can have multiple break points, and the program will run normally until the
specified statement is about to be executed. Simply place the cursor on
the line at which the program is to stop, and press F9. That line will be
highlighted to show that it is currently a Break Point. Pressing F9 again
removes the Break Point.
A Watch Point tells BASIC to execute the program, until a certain
condition becomes true. Some examples of Watch Points are X% = 100,
ABS(Total#) > 1000, and FRE("") < 1000. In the first example you are
telling BASIC to stop the program and return to the editor when X% equals
100. The second example will stop the program when the absolute value of
Total# exceeds 1000, and the third halts it when there are fewer than 1000
bytes of string space remaining.
Considered together, these debugging features are extremely powerful.
You can tell BASIC, in effect, "Run until the value of Count% hits 14; then
stop the program, and let me walk backwards through the program to see how
that happened."
USING /D TO DETECT ERRORS
Another very powerful debugging solution at your disposal is to compile
your program with the /d debug option. When creating an .EXE file in the
BASIC environment from the Run menu, you would select the "Produce debug
code" option. Compiling with /d tells BC to add three important safeguards
to the code it generates. Some of these debugging issues were described in
Chapter 1, but they deserve elaboration here.
The first code addition is a call to a central event handler prior to
every BASIC program statement, to detect if Ctrl-Break was pressed.
Normally, a compiled BASIC program ignores Ctrl-Break and Ctrl-C
unless the program is processing an INPUT statement. BASIC adds
break checking to let you get out of an endless loop or other similar
situation, without having to reboot your computer.
The second addition is an overflow test following each integer and
long integer addition, subtraction, and multiplication, to detect results
that exceed the range of legal values. If you have a statement such as X%
= Y% * Z% and the result after multiplying is greater than 32767, the
overflow test will detect that and produce an error message. Otherwise, X%
would be assigned an erroneous value and your program would have no way to
detect it. Floating point operations do not need any additional testing,
because overflows are detected and reported whether or not /d is used.
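The overflow test is easy to see for yourself. The short program below
is only a sketch, and the file name OVERFLOW.BAS assumed in the commands
that follow it is hypothetical. INPUT is used so the values being
multiplied are not known until the program runs.

    INPUT "Enter two integers: ", Y%, Z%
    X% = Y% * Z%                'overflows if the product exceeds 32767
    PRINT "The product is"; X%
    END

Enter 200 for both values. When the program is compiled as shown here,
it should end with an Overflow error rather than print a result:

    bc overflow /d;
    link overflow;

Compiled without /d, the same input should instead print an erroneous
value, with no error reported.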
The last additional code that BASIC adds when /d is used is array
element bounds checking. If you have dimensioned an array and attempt to
assign an element that doesn't exist, a compiled BASIC program will
normally ignore the error. For example, if an array has been dimensioned
using DIM Array%(1 TO 100) and you then have the statement Array%(200) =
12, BASIC will store the value 12 at what would have been the 200th
element. This can lead to disastrous consequences such as overwriting an
element in another array, or corrupting string memory. When /d is used
BASIC adds additional code to check every array element referenced, and
reports an error if that element does not exist.
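A minimal sketch of the bounds problem follows. The second array and
the prompt text are assumptions added for illustration, and the element
number is read with INPUT so it is not known until the program runs.

    DIM Array%(1 TO 100)
    DIM Other%(1 TO 100)
    INPUT "Element number to assign: ", N%
    Array%(N%) = 12             'N% = 200 is past the end of the array
    PRINT "Assignment completed"
    END

Enter 200 at the prompt. Compiled with /d, the program should stop with
a "Subscript out of range" error at the assignment. Compiled without
/d, it will most likely print its message as though nothing were wrong.
Which memory is actually overwritten depends on how the compiler has
arranged the program's data, so the damage may not show up until much
later, if at all.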
Because of the added checking for overflow errors and illegal element
numbers, a program compiled with /d will be larger and run more slowly than
one in which /d is not used. Therefore, you should not release a program
for general use that has been compiled with the debug option. One
exception worth noting is that QuickBASIC versions 4.0 and 4.5 contain a
bug that generates incorrect code for certain long integer array
operations. The only solution when that happens is to use /d. This way,
the routine that calculates element addresses and checks for illegal
element numbers is used, rather than the incorrect in-line code that BC
produces directly.
You could also compile with the /ah (huge array) switch, which uses
the same routine to calculate and check array element addresses. Using /ah
has an advantage over /d in this case, because your program will not be
halted if Ctrl-Break is pressed. Using /ah also avoids the extra code and
time to check for overflow errors. However, /ah affects dynamic arrays
only, and errors with static arrays will not be prevented.
When a program is run in the BASIC editor, the same protection that /d
provides is employed. This added debug testing within the editor is one
more contributor to its slowness when compared to a fully compiled program.
ADVANCED DEBUGGING
Although being able to step through your program and watch its variables in
the BASIC editing environment is very powerful, there are still some
limitations inherent in that process. For example, it is possible that a
program will work perfectly in the editor, but not when it has been
compiled to an .EXE program. Microsoft has tried to make the BASIC editor
as compatible with BC as possible, but the editor is an interpreter and not
a true compiler. There are bound to be some differences in how the program
runs. Another limitation is that some programs are just too large to be
run within the editor. Finally, if you receive an error message from an
executable program that lists only a segment and address, there is no way
to determine where the error occurred using the editor.
In these cases you will need to work with the actual compiled program.
To relate an error address to the original BASIC source statement you must
be able to see the assembly language code that BC generates, along with the
original BASIC source. One way to do this is with the Microsoft CodeView
debugger. CodeView comes with BASIC PDS [and VB/DOS Professional Edition]
as well as with Microsoft's Macro Assembler. CodeView provides a debugging
environment that is similar to the QB editor, except it is intended for
tracing through a program that has already been compiled.
Another way is to instruct BC to generate an assembly language source
listing as it compiles your program. This listing shows a mix of BASIC
source statements and the resultant assembly language code and addresses.
However, the listing is not as clear or easy to follow as the display that
CodeView presents. But if you do not have CodeView, this is your only
choice. I will describe this method first.
CREATING AN ASSEMBLY LANGUAGE SOURCE LISTING
To create an assembly language list file you use the compiler's /a switch,
and then specify a list file name. The syntax is shown below, followed by
a sample list file that is generated.
You enter this:
bc program /a [/other options] , , listfile;
LISTFILE.LST contains this:
PAGE 1
25 June 91
14:28:08
Microsoft (R) QuickBASIC Compiler Version 4.50
Offset  Data  Source Line
0030    0006  CLS
0030    0006  INPUT Count%
0030     **          I00002: mov   ax,0FFFFh
0033     **                  push  ax
0034     **                  call  B$SCLS
0039     **                  mov   ax,offset <const>
003C     **                  push  ax
003D     **                  call  0000h
0040     **                  pop   ax
0041     **                  add   ax,000Dh
0044     **                  push  cs
0045     **                  push  ax
0046     **                  call  B$INPP
004B     **                  jmp   $+04h
004D     **                  dw    0002h
004F     **                  db    00h
0050     **                  db    02h
0051     **                  mov   bx,offset COUNT%
0054     **                  push  ds
0055     **                  pop   es
0056     **                  push  es
0057     **                  push  bx
0058     **                  call  B$RDI2
005D    0008  IF Count% < 100 THEN
005D    0008    Count% = 100
005D    0008  END IF
005D     **                  call  B$PEOS
0062     **                  cmp   word ptr COUNT%,64h
0067     **                  jl    $+03h
0069     **                  jmp   I00003
006C     **                  mov   COUNT%,0064h
0072    0008  PRINT Count%
0072    0008  END
0072    0008
0072    0008
0072     **          I00003: push  COUNT%
0076     **                  call  B$PEI2
007B     **                  call  B$CEND
0080     **                  call  B$CENP
0085    0008
43981 Bytes Available
43643 Bytes Free
0 Warning Error(s)
0 Severe Error(s)
Here, the list file shows the original BASIC source code, as well as the
generated assembly language instructions. The column at the left holds the
code addresses, and these correspond to the addresses that BASIC displays
when a program crashes with an error message. Unfortunately, several BASIC
statements are grouped together, so it is not immediately apparent which
address goes with which source statement. For example, after the BASIC
statement INPUT Count%, the earlier assembly language instructions that
clear the screen are shown. Similarly, the call to B$PEOS is actually part
of the INPUT code, although it is listed following the IF test.
When BASIC ends your program with an error message that includes a
segmented address, only the address (offset) portion is meaningful.
The segment in which a program is running will depend on many factors,
including the DOS version (and thus its size), the FILES= and BUFFERS=
values specified in CONFIG.SYS, and whether TSR programs and device drivers
are loaded. Each of these factors causes the program to be loaded at a
higher segment, although the addresses within that segment never change.
Also, in a multi-module program, a different segment is used for each
module's source file. Therefore, if the message is "Illegal function call
in module XYZ at address 3456:1234", you would compile XYZ.BAS, rather than
the main program, to create a list file. The code in the vicinity of
address 1234 will be where the error occurred.
USING MICROSOFT CODEVIEW
Although compiling with the /a switch lets you view the assembly language
code that BASIC creates, there is little you can actually do with that
information. CodeView is a much more powerful debugging tool, and it lets
you step through an .EXE file as it is running. This lets you follow the
compiled program's execution path, and also view its assembly language
instructions. Further, CodeView can trace into BASIC's library routines,
as well as calls to C or assembly language routines that you have written.
CodeView can also be used to see how many bytes of code are generated
for each BASIC statement. This is a good way to compare the relative
efficiency of different programming methods, to see which ones produce less
code. It is important to understand that the size of the assembly language
code generated for a given BASIC statement is a combination of two factors:
the number of bytes the compiler generates for each occurrence of the
statement, and the size of the called routine within BASIC's runtime
library. Of course, the called routine is added to your program only once.
However, the code that sets up and calls the routine is added each time the
statement is encountered.
Compiling a program for use with CodeView is very simple, and merely
requires the addition of special compiler and linker option switches. Note
that you cannot compile a program for CodeView from within the QuickBASIC
editor; you must compile and link manually from the DOS command line, as
shown below. Also notice that the BASIC program must be saved as ASCII
text, and not with the special "Fast Load" method that QB optionally uses.
bc program /zi [/other options];
link program /co [/other options];
cv program
The /zi option tells BC to write additional information into the object
file, which is used by LINK and CodeView to relate each line of BASIC
source code to its resultant assembly code. The more meaningfully named
/co switch is required so LINK will know to do likewise. You may be
interested to know that /zi is named after Microsoft legend Mark
Zbikowski, whose initials (MZ) also appear as the first two bytes in every
DOS .EXE file.
Once the program has been compiled and linked, start CodeView by
entering CV followed by the file's first name (that is, without the .BAS or
.EXE extension). You will then be presented with a screen very similar to
that of the QB editor. Most versions of CodeView initially show the BASIC
source code. In other versions, you must press Alt-R-R to "restart" the
program and bring it to the first source line. I should point out that
CodeView is a quirky program, and it is often referred to as the program
that people "love to hate". It has some glaring omissions, many aspects of
its interface are inconsistent and downright obnoxious, and I personally
would be lost without it.
When the BASIC source is displayed, you may press F4, F7, F8, and F10,
which perform the same functions as their BASIC editor counterparts. One
important difference, however, is that you may also press F3 to show a mix
of BASIC and assembly language code. Stepping through the program with F8
and F10 will execute either a single BASIC statement or a single assembler
command, depending on the context. That is, if you are in the BASIC view
mode, then you will step through the BASIC code. If the assembly language
code is being displayed, then you will step through that instead.
Figure 4-1 [not available here, sorry] shows a screen snapshot of a
short sample program as displayed by CodeView when it is first started in
the BASIC view mode. Figure 4-2 [also unavailable] shows the same program
after pressing F10 to execute up to the first statement, followed by F3 to
view a mix of BASIC and assembly language. This screen is in a 50-line
mode to allow the entire program to be displayed. Although it is not shown
here, CodeView can continuously display the processor's registers in a
small window at the right side of the screen. The register display is
alternately activated and deactivated by pressing F2.
FIG4-1: The CodeView display when using the BASIC view mode.
FIG4-2: The CodeView display for the same program, but using the assembly
language view mode.
Notice in Figure 4-2 that CodeView displays each BASIC statement indented
and with a line number. This lets you identify where each BASIC command
starts, and also which block of assembly language code it is associated
with. The numbers at the left edge of the display show the segment and
address of each instruction in hexadecimal notation. The segment value
never changes within a single program module, although the addresses
increase based on the number of bytes in each assembly language
instruction. As you can see, some assembly language commands are as short
as one byte, and others are as long as six.
In the first instruction, CLS, a value of -1 (FFFF hex) is passed to
the CLS routine as a flag to show that no argument was given. Had the
BASIC statement been CLS 2, then a value of 2 would have been moved into AX
instead. Nine bytes of code are generated each time CLS is used, not
counting the code within B$SCLS. Besides showing the B$SCLS routine name,
CodeView also shows the segment and address at which B$SCLS resides.
Knowing the routine's address is of little practical use in this situation,
and it is displayed solely for informational purposes.
The INPUT statement is fairly complicated to set up, and I won't
belabor what every assembly language instruction does. But several items
are worth discussing. The first is that CodeView attempts to relate every
number it encounters to a variable or procedure address. In many cases
this is confusing, because some numbers are simply that, and have no
relationship to a variable or procedure address.
For example, at address 39 the assembly language command MOV AX,40 is
shown as MOV AX,b$STRTAB_END+10 (0040), as if there were some significance
to the fact that the value 40 is an address ten bytes past the end of an
internal string table. Likewise, two instructions later the value 40 is
represented as being 31 bytes past the beginning of the B$LENDRW procedure.
Two instructions past that the value 13 (0D hex) is added to AX, and again
CodeView tries to establish a significance where none exists.
In not one of these cases are the values shown related to the named
address, and you should therefore treat those named labels with skepticism.
The only symbolic names that are meaningful in most cases are variable and
procedure names that do not have an extra value added to them. In the
instruction MOV Word Ptr [COUNT% (0036)],b$HEAP_FIRST (0064) at address 6C,
the address for Count% (36) is valid, while the value 64 named b$HEAP_FIRST
is meaningless. In this case, 64 hex represents the value 100 in the BASIC
statement Count% = 100. Whatever b$HEAP_FIRST may represent, it has no
meaning here.
I suggest that you enter this short program and then step through it
one statement at a time, just to get a feel for how CodeView operates. You
should also try tracing into some of the BASIC library calls, as well as
into a simple subprogram or two of your own. Again, you may use either F10
or F8 to step through the code, but only F8 will trace into code that is
being called. You can also use F8 to trace into some BIOS interrupts, but
you should never try to trace through a DOS interrupt (21 hex). Many DOS
services never return, or return in a non-standard manner, and a locked-up
PC is the likely result. You will not hurt anything if you do trace into a
DOS interrupt, but be prepared to press Ctrl-Alt-Del.
Besides being able to view and step through the assembly language code
that BASIC creates, you can also view and modify your program's data
directly. If you have pressed F2 to display the CPU's registers, CodeView
will show the value currently in every memory address that is about to be
accessed. For example, if the next statement to be executed is MOV Word
Ptr [COUNT%],10, CodeView will show the current contents of the variable
COUNT%.
A range of memory addresses may be displayed by entering commands into
the immediate window at the bottom of the screen. When CodeView is first
started, the cursor is placed at the bottom line in that window. As with
the BASIC editor, the F6 key is used to toggle between the code output and
immediate windows. Unlike the BASIC editor, however, you may type commands
regardless of which window is active.
The three primary commands you will find useful are D, U, and R. The
D (Dump) command tells CodeView to display a range of memory, starting at a
given address. For example, D 0 means to show the 32 bytes that start at
address 0 in the default data segment. Likewise, D ES:100 means to start
at address 100 in the segment held in the ES register. Unfortunately,
CodeView is particularly obtuse in this regard, because in some cases the
numbers you enter are assumed to be decimal while in others it assumes
hexadecimal. Which is which depends on your view perspective (selected
with F3), and I won't even begin to offer a reason or explain the confusing
rules. If you don't get what you expect, try adding an "&H" prefix to the
number. And if you start by using &H and CodeView reports a syntax error,
then try it without the &H.
When the contents of memory are displayed, they are shown as
individual bytes rather than as integer words, which would generally be
more useful. In the listing below, two string constants have been displayed in
response to the command D &H40. For space reasons, the segment and address
which CodeView adds to the left of each row of values are instead shown
above the rows.
>D &H40
5676:0040
02 00 44 00 48 69 23 00 4A 00 41 42 43 44 45 46
5676:0050
47 48 49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56
As you learned in Chapter 2, BASIC near strings have a 4-byte descriptor,
with the first two bytes holding the string's current length, and the
second two bytes its current address. Beginning with the first two numbers
displayed, the 02 00 represents the length of a 2-character string, and the
44 00 indicates its address, which is 44 Hex. The data itself is a
CHR$(&H48) followed by a CHR$(&H69) ("Hi"), and it immediately follows the
string descriptor. When two bytes are used to store an integer word, the
least significant byte is kept in the lower memory address. Therefore, the
value 0002 is actually listed as 02 00 (CodeView adds an extra blank
between bytes for clarity).
Immediately following the six bytes for the string "Hi" and its
descriptor is another descriptor. This one shows that the string has a
length of 23 Hex (35) bytes, and its data starts at address 4A Hex. Again,
the value 0023 is shown as 23 00, and the address 004A is displayed as
4A 00. The 32 bytes displayed end before the string does, so only its
first 22 characters, "ABCDEFGHIJKLMNOPQRSTUV", are visible.
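The byte order used in these descriptors can be checked from BASIC
itself. The sketch below builds a string from the four descriptor bytes
23 00 4A 00 in the order they appear in memory, and then uses CVI to
convert each pair of bytes to an integer. The variable names are
arbitrary.

    'The four descriptor bytes, in the order they appear in the dump
    Desc$ = CHR$(&H23) + CHR$(&H0) + CHR$(&H4A) + CHR$(&H0)
    Length% = CVI(LEFT$(Desc$, 2))      'low byte first, so &H0023
    Address% = CVI(MID$(Desc$, 3, 2))   'low byte first, so &H004A
    PRINT Length%; HEX$(Address%)
    END

This should print 35 (the decimal equivalent of 23 Hex) followed by 4A.
The same result can be had by hand, by multiplying the second byte of
each pair by 256 and adding the first byte.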
The U (Unassemble) command can be used to show the assembly language
source code at any arbitrary segment and address. The command U 2000:1000
will unassemble the code at address 2000:1000, though again you may need to
use U &H2000:&H1000 in some view modes. The U command is not used that
frequently, since CodeView is used most often to step through code in
sequence, rather than to examine an arbitrary block of instructions.
The R command lets you change the contents of a register, and this
might be useful when debugging your own assembly language subroutines.
When you type, for example, RCX and press Enter, the current value of the
CX register is displayed and you are prompted for a new value. Pressing
Enter alone cancels the command and leaves the current register contents
intact. Otherwise, the value you enter will be assigned to CX. This is
similar to BASIC's immediate window, in which you can assign new values to
a variable.
The last CodeView features worth describing here are Watch Variables
and Watch Points, which are similar to the same features in QB. Unlike QB,
though, you cannot use an expression as the target of a Watch; it must be a
simple variable name, array element, or address. Watch Variables may be
added using the pull-down menu, or by pressing Alt-W and then typing the
variable name. If you are in the BASIC view mode you may add only BASIC
variables; in the assembly language view mode you can add only assembly
language variables. To monitor the contents of a memory address requires
the W command. For example, W 40 will set up address 40 as the target of a
Watch.
Although CodeView does support Watch Points, whereby the program will
run continuously until a given expression is true, you won't want to use
that feature. Asking CodeView to stop when, say, CX becomes greater than
100 will cause your program to run at less than one thousandth its normal
speed. Therefore, I have never found using Watch Points effective in any
situation--it is always too slow.
I have avoided discussing the latest versions of CodeView, in favor of
focusing on those features which are common to all versions. CodeView 3.10,
which is included with BASIC 7.1, has several new convenience features, and
a few new bugs as well. Many of the commands that in earlier versions have
to be entered manually are now available by simply typing new values onto
the display. For instance, where older versions of CodeView required you
to enter Dump commands repeatedly, the new version updates the displayed
values in a range of addresses constantly. And to change the address
range, you may now simply move the cursor to the segment and address
numbers and type new ones. An option to display memory values as words or
even single and double precision values is also present in version 3.10.
Now that you have seen what CodeView is all about and how to use it, I
want to conclude this chapter with a practical example. As I mentioned in
Chapter 3, the amount of stack memory that is needed in a non-static
subprogram or function can be difficult to determine. The calculation
itself is trivial: simply add up the number of bytes needed by every
variable in the routine. Each integer requires two bytes; single
precision, long integer, and string variables need four bytes each; and so
forth. The problem, of course, is who wants to do all that counting,
especially when there may be hundreds of variables. Counting is what
computers are for, no?
The solution is that BASIC knows how many bytes are needed for the
subprogram, and the very first thing a subprogram does when it is invoked
is to call another routine that allocates the necessary stack space. So
rather than use trial and error methods to increase the stack in small
increments, you can use CodeView to directly see how many bytes of stack
space are being requested. Here's how that's done, using the example
program shown below.
DEFINT A-Z
DECLARE SUB StackTest (Dummy)
Test = 10
CALL StackTest(Test)
END
SUB StackTest (AnyVar)
  X = 100
  Y = 10
  Z = AnyVar
END SUB
Save this program as an ASCII file using the name TEST.BAS, and then
compile it with the /o and /zi options. Next, link TEST.OBJ for CodeView
using the /co option. Then start CodeView by entering CV TEST. Once you
are in CodeView and viewing the BASIC source, press F10 to skip past
BASIC's start-up code. At this point the cursor should be on the first
statement, Test = 10. Finally, press F3 to show a mix of BASIC and
assembly language source code. The display should look similar to that
shown in Figure 4-3 [unavailable].
FIG4-3: How to determine the amount of stack memory needed for a non-static
procedure.
Notice the first statement within the StackTest subprogram at line 7, where
the value 6 (erroneously labeled b$STRTAB+6) is assigned to the CX
register. This is the number of bytes of stack space being requested from
the B$ENRA routine which is called in the next instruction. B$ENRA is the
routine that actually allocates the stack memory, and it uses the value
BASIC sends in CX to know how many bytes are needed. StackTest has three
local variables and each is a two-byte integer, hence six bytes are
required to store them on the stack.
For a very large program, the value assigned to CX will of course be
much larger. Further, if one subprogram calls another, it will be up to
you to add up all of the CX values to determine the total stack memory
requirements. But this is very much easier than counting variables.
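To see how the values add up when one subprogram calls another, a test
program such as the following sketch can be examined in CodeView the
same way. The subprogram and variable names here are hypothetical, and
the byte counts in the comments are only what the counting rule given
earlier predicts.

    DEFINT A-Z
    DECLARE SUB Outer ()
    DECLARE SUB Inner ()
    CALL Outer
    END

    SUB Outer
      A = 1                   'two local integers, so CX should
      B = 2                   '  be assigned 4
      CALL Inner
    END SUB

    SUB Inner
      X = 100                 'three local integers, so CX should
      Y = 10                  '  be assigned 6
      Z = 1
    END SUB

While Inner is running, the local variables for both subprograms are on
the stack at once, so the two CX values must be added together. If the
values are 4 and 6 as predicted, ten bytes are needed for local
variables, in addition to whatever the calls themselves require. Treat
the predicted numbers as something to verify in CodeView rather than as
a given, since the compiler may request more than a simple count of the
variables suggests.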
SUMMARY
In this chapter you have learned how to identify and correct common
programming errors. You have also learned the importance of understanding
BASIC's various quirks, and how some statements do not always do exactly
what you thought they would. I have shown several debugging strategies,
including a software adaptation of the "cut in half" hardware technique.
Perhaps your most powerful debugging allies are the QuickBASIC and QBX
editing environments. These editors let you single step through a
program, monitor variable values and function results, and halt your
program when a specified condition occurs.
When BASIC terminates a program prematurely with an error message and
a segmented address, you can either use the BC compiler's /a option to
generate a source listing, or use CodeView to see where the error occurred.
CodeView can also be used to step and trace through a program at the
assembly language source level, and to determine the number of bytes of
stack memory a non-static procedure requires.
In Chapter 5 you will learn about compiling and linking BASIC
programs. I will present a complete overview of the many BC and LINK
options that are available, and discuss the relative merits of each.